NSF PAR Search | NSF Public Access Repository

Board 408: Toward Building a Human-Computer Coding Partnership: Using Machine Learning to Analyze Short-Answer Explanations to Conceptually Challenging Questions

https://doi.org/10.18260/1-2--46996

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2024, ASEE Conferences)

Full Text Available
Using Machine Learning to Analyze Short-Answer Responses to Conceptually Challenging Chemical Engineering Thermodynamics Questions

https://doi.org/10.18260/1-2--48236

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2024, ASEE Conferences)

Full Text Available
Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning

https://doi.org/10.18653/v1/2023.emnlp-main.130

Shivagunde, Namrata; Lialin, Vladislav; Rumshisky, Anna (December 2023, Association for Computational Linguistics)

Language model probing is often used to test specific capabilities of models. However, conclusions from such studies may be limited when the probing benchmarks are small and lack statistical power. In this work, we introduce new, larger datasets for negation (NEG-1500-SIMP) and role reversal (ROLE-1500) inspired by psycholinguistic studies. We dramatically extend the existing NEG-136 and ROLE-88 benchmarks using GPT3, increasing their size from 18 and 44 sentence pairs to 750 each. We also create another version of the extended negation dataset (NEG-1500-SIMP-TEMP) using template-based generation; it consists of 770 sentence pairs. We evaluate 22 models on the extended datasets and see model performance dip 20-57% compared to the original smaller benchmarks. We observe high levels of negation sensitivity in models like BERT and ALBERT, demonstrating that previous findings might have been skewed due to smaller test sets. Finally, we observe that although GPT3 generated all the examples in ROLE-1500, it is only able to solve 24.6% of them during probing. The datasets and code are available on GitHub.
Full Text Available
WIP: Using Machine Learning to Automate Coding of Student Explanations to Challenging Mechanics Concept Questions

Auby, Harpreet; Shivagunde, Namrata; Rumshisky, Anna; Koretsky, Milo (June 2022, ASEE 2022 Annual Conference)

This work-in-progress paper presents a joint effort by engineering education and machine learning researchers to develop automated methods for analyzing student responses to challenging conceptual questions in mechanics. These open-ended questions, which emphasize understanding of physical principles rather than calculations, are widely used in large STEM classes to support active learning strategies that have been shown to improve student outcomes. Despite their benefits, written justifications are not commonly used, largely because evaluating them is time-consuming for both instructors and researchers. This study explores the potential of large pre-trained generative sequence-to-sequence language models to streamline the analysis and coding of these student responses.
Full Text Available
Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

https://doi.org/10.18653/v1/2022.acl-long.189

Kulshreshtha, Saurabh; Kovaleva, Olga; Shivagunde, Namrata; Rumshisky, Anna (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release a corpus of crossword puzzles collected from the New York Times daily crossword, spanning 25 years and comprising around nine thousand puzzles. These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual, as well as clues that depend on the answers to other clues. We separately release the clue-answer pairs from these puzzles as an open-domain question answering dataset containing over half a million unique clue-answer pairs. For the question answering task, our baselines include several sequence-to-sequence and retrieval-based generative models. We also introduce a non-parametric constraint satisfaction baseline for solving the entire crossword puzzle. Finally, we propose an evaluation framework that consists of several complementary performance metrics.
Full Text Available
Life after BERT: What do Other Muppets Understand about Language?

https://doi.org/10.18653/v1/2022.acl-long.227

Lialin, Vladislav; Zhao, Kevin; Shivagunde, Namrata; Rumshisky, Anna (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
